Network-based auto-probit modeling for protein function prediction.
Abstract
Predicting the functional roles of proteins from genome-wide data, such as protein-protein association networks, has become a canonical problem in computational biology. Approaching this task as a binary classification problem, we develop a network-based extension of the spatial auto-probit model. In particular, we develop a hierarchical Bayesian probit-based framework for modeling binary network-indexed processes, with a latent multivariate conditional autoregressive Gaussian process. The latter allows for the easy incorporation of protein-protein association network topologies, either binary or weighted, in modeling protein functional similarity. We use this framework to predict protein functions, for functions defined as terms in the Gene Ontology (GO) database, a popular, rigorous vocabulary for biological functionality. Furthermore, we show how a natural extension of this framework can be used to model and correct for the high percentage of false negative labels in training data derived from GO, a serious shortcoming endemic to biological databases of this type. We evaluate the performance of our method and compare it with standard algorithms on weighted yeast protein-protein association networks extracted from a recently developed integrative database, the Search Tool for the Retrieval of INteracting Genes/proteins (STRING). Results show that our basic method is competitive with these other methods, and that the extended method, which incorporates the uncertainty in negative labels among the training data, can yield nontrivial improvements in predictive accuracy.
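As a rough illustration of the kind of model the abstract describes, the sketch below simulates binary labels on a small weighted network by thresholding a latent Gaussian vector whose precision matrix is built from the network. The abstract does not give the exact specification, so everything here is an assumption: the "proper CAR" precision form tau * (D - rho * W), the parameter names `rho`, `tau`, `mu`, and `fn_rate`, the toy 4-node network, and all function names are hypothetical and are not taken from the paper. The last function mimics false negative labels by hiding a fraction of true positives. It is a forward simulation only, not the Bayesian inference procedure.

```python
import math
import numpy as np

def car_precision(W, rho=0.9, tau=1.0):
    """Assumed proper-CAR precision: tau * (D - rho * W), D = diag of weighted degrees."""
    W = np.asarray(W, dtype=float)
    D = np.diag(W.sum(axis=1))
    return tau * (D - rho * W)

def car_covariance(W, rho=0.9, tau=1.0):
    """Covariance of the latent process; symmetrized to remove round-off asymmetry."""
    cov = np.linalg.inv(car_precision(W, rho, tau))
    return (cov + cov.T) / 2.0

def sample_true_labels(W, mu=0.0, rho=0.9, tau=1.0, seed=0):
    """Draw latent z ~ N(mu, Q^{-1}) and apply the probit threshold y_i = 1 if z_i > 0."""
    rng = np.random.default_rng(seed)
    cov = car_covariance(W, rho, tau)
    z = rng.multivariate_normal(np.full(cov.shape[0], mu), cov)
    return z, (z > 0).astype(int)

def marginal_positive_prob(W, mu=0.0, rho=0.9, tau=1.0):
    """Marginal P(y_i = 1) = Phi(mu / sd_i), with Phi the standard normal CDF."""
    sd = np.sqrt(np.diag(car_covariance(W, rho, tau)))
    return np.array([0.5 * (1.0 + math.erf(mu / (s * math.sqrt(2.0)))) for s in sd])

def hide_positives(y, fn_rate=0.3, seed=1):
    """Mimic false negatives: each true positive is recorded as 0 with probability fn_rate."""
    rng = np.random.default_rng(seed)
    keep = rng.random(len(y)) >= fn_rate
    return y * keep.astype(int)

# Toy weighted association network on 4 proteins (symmetric, made-up weights).
W = np.array([
    [0.0, 0.9, 0.0, 0.0],
    [0.9, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.7],
    [0.0, 0.0, 0.7, 0.0],
])

z, y = sample_true_labels(W)
observed = hide_positives(y)
print("latent z:       ", np.round(z, 3))
print("true labels:    ", y)
print("observed labels:", observed)
```

With |rho| < 1 and every node having at least one positively weighted edge, D - rho * W is strictly diagonally dominant and therefore positive definite, which is why the inverse above exists. Strongly connected proteins receive positively correlated latent values and so tend to share labels.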
Similar resources
Statistical Methods for the Analysis of Network Data
14:00-14:45 "Network-based auto-probit modeling with application to protein function prediction" Eric D. Kolaczyk 09:35-10:10 "Quantifying and comparing complexity of cellular networks: structure beyond degree statistics" Alessia Annibale and Anthony Coolen 10:10-10:45 "Node and link roles in protein-protein interaction networks" "Using Distinct Aspects of Social Network Analysis to Impr...
Probit-Based Traffic Assignment: A Comparative Study between Link-Based Simulation Algorithm and Path-Based Assignment and Generalization to Random-Coefficient Approach
Probabilistic approach of traffic assignment has been primarily developed to provide a more realistic and flexible theoretical framework to represent traveler’s route choice behavior in a transportation network. The problem of path overlapping in network modelling has been one of the main issues to be tackled. Due to its flexible covariance structure, probit model can adequately address the pro...
Link Prediction using Network Embedding based on Global Similarity
Background: The link prediction issue is one of the most widely used problems in complex network analysis. Link prediction requires knowing the background of previous link connections and combining them with available information. The link prediction local approaches with node structure objectives are fast in case of speed but are not accurate enough. On the other hand, the global link predicti...
A comparison of different network based modeling methods for prediction of the torque of a SI engine equipped with variable valve timing
Nowadays, due to increasing the complexity of IC engines, calibration task becomes more severe and the need to use surrogate models for investigating of the engine behavior arises. Accordingly, many black box modeling approaches have been used in this context among which network based models are of the most powerful approaches thanks to their flexible structures. In this paper four network base...
ANN Based Modeling for Prediction of Evaporation in Reservoirs (RESEARCH NOTE)
This paper is an attempt to assess the potential and usefulness of ANN based modeling for evaporation prediction from a reservoir, where in classical and empirical equations failed to predict the evaporation accurately. The meteorological data set of daily pan evaporation, temperature, solar radiation, relative humidity, wind speed is used in this study. The performance of feed forward back pro...
Journal:
- Biometrics
Volume 67, Issue 3
Pages: -
Publication date: 2011